Active Learning for Building a Corpus of Questions for Parsing
نویسندگان
چکیده
This paper describes how we built a dependency Treebank for questions. The questions for the Treebank were drawn from questions from the TREC 10 QA task and from Yahoo! Answers. Among the uses for the corpus is to train a dependency parser achieving good accuracy on parsing questions without hurting its overall accuracy. We also explore active learning techniques to determine the suitable size for a corpus of questions in order to achieve adequate accuracy while minimizing the annotation efforts.
منابع مشابه
تأثیر ساختواژهها در تجزیه وابستگی زبان فارسی
Data-driven systems can be adapted to different languages and domains easily. Using this trend in dependency parsing was lead to introduce data-driven approaches. Existence of appreciate corpora that contain sentences and theirs associated dependency trees are the only pre-requirement in data-driven approaches. Despite obtaining high accurate results for dependency parsing task in English langu...
متن کاملActive participation of student s in teaching
Active participation of students in teaching is the one of the effective way of learning in science education according to large investigation. By this way students understand current level of knowledge, increasing the student’s eagerness and their attraction for learning. This study was designed to explore the effect of active student involvement in teaching and learning of the bacteriology in...
متن کاملDomain Adaptation by Active Learning
We tackled the Evalita 2011 Domain Adaptation task with a strategy of active learning. The DeSR parser can be configured to provide different measures of perplexity in its own ability to parse sentences correctly. After parsing sentences in the target domain, a small number of the sentences with the highest perplexity were selected, revised manually and added to the training corpus in order to ...
متن کاملActive Learning for Statistical Natural Language Parsing
It is necessary to have a (large) annotated corpus to build a statistical parser. Acquisition of such a corpus is costly and time-consuming. This paper presents a method to reduce this demand using active learning, which selects what samples to annotate, instead of annotating blindly the whole training corpus. Sample selection for annotation is based upon “representativeness” and “usefulness”. ...
متن کاملSample Selection for Statistical Parsing
Corpus-based statistical parsing relies on using large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes to use sample selection to find helpful training examples and reduce human effort spent on annotating less informative ones. We consider several criteria for predicting whether unlabeled data might be a help...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010